Neural Architecture Search (NAS) is an automatic technique that searches for well-performing architectures for a specific task. Although NAS surpasses human-designed architectures in many fields, the high computational cost of architecture evaluation hinders its development. A feasible solution is to evaluate metrics directly on the architecture at initialization, without any training. The NAS without training (WOT) score is such a metric: it estimates the final trained accuracy of an architecture from its ability to distinguish different inputs at the activation layers. However, the WOT score is not an atomic metric, meaning that it does not represent a fundamental indicator of the architecture. The contributions of this paper are threefold. First, we decouple WOT into two atomic metrics, which represent the distinguishing ability of the network and the number of activation units, and explore better combination rules, named Distinguishing Activation Score (DAS). We prove the correctness of the decoupling theoretically and confirm the effectiveness of the rules experimentally. Second, to improve the prediction accuracy of DAS to meet practical search requirements, we propose a fast training strategy. When DAS is combined with the fast training strategy, it yields further improvements. Third, we propose a dataset called Darts-training-bench (DTB), which fills the gap that existing datasets contain no training states of architectures. Our proposed method achieves 1.04$\times$ to 1.56$\times$ improvements on NAS-Bench-101, Network Design Spaces, and the proposed DTB.
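
The idea of scoring an untrained network by how well its activations distinguish inputs can be illustrated with a small NumPy sketch. It assumes the commonly described form of the WOT score: each input gets a binary code (which ReLU units fire), a kernel counts the units on which two codes agree, and the score is the log-determinant of that kernel. The function names and the toy MLP are hypothetical, and this does not reproduce the DAS decoupling, whose details the abstract does not give.

```python
import numpy as np

def binary_codes(x, weights):
    """Run a bias-free ReLU MLP and return one 0/1 activation code per input row."""
    codes = []
    h = x
    for W in weights:
        pre = h @ W
        codes.append(pre > 0)          # which units fire for each input
        h = np.maximum(pre, 0.0)       # ReLU
    return np.concatenate(codes, axis=1).astype(np.float64)

def wot_score(codes):
    """Log-determinant of the agreement kernel K[i, j] = N_A - Hamming(c_i, c_j)."""
    # Agreements = units where both codes are 1 plus units where both are 0.
    kernel = codes @ codes.T + (1.0 - codes) @ (1.0 - codes).T
    _, logdet = np.linalg.slogdet(kernel)
    return logdet

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(8, 16))                       # 8 random inputs
    layers = [rng.normal(size=(16, 32)), rng.normal(size=(32, 32))]
    print(wot_score(binary_codes(batch, layers)))
```

Codes that are more distinct give a kernel closer to diagonal and hence a higher score; two inputs with identical codes make the kernel singular, so the score collapses.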
This paper proposes a deep recurrent Rotation Averaging Graph Optimizer (RAGO) for Multiple Rotation Averaging (MRA). Conventional optimization-based methods usually fail to produce accurate results because the relative measurements are corrupted and noisy. Recent learning-based approaches treat MRA as a regression problem, but these methods are sensitive to initialization because of the gauge freedom problem. To handle these problems, we propose a learnable iterative graph optimizer that minimizes a gauge-invariant cost function, with an edge rectification strategy to mitigate the effect of inaccurate measurements. Our graph optimizer iteratively refines the global camera rotations by minimizing each node's single rotation objective function. In addition, our approach iteratively rectifies relative rotations to make them more consistent with the current camera orientations and the observed relative rotations. Furthermore, we employ a gated recurrent unit to improve the result by tracing the temporal information of the cost graph. Our framework is a real-time learning-to-optimize rotation averaging graph optimizer with a tiny model size, suited to deployment in real-world applications. RAGO outperforms previous traditional and deep methods on real-world and synthetic datasets. The code is available at https://github.com/sfu-gruvi-3dv/RAGO
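
To make "gauge-invariant cost" concrete, here is a minimal sketch of a rotation averaging cost that does not change when every camera is rotated by the same global rotation. It uses the standard geodesic angle between each measured relative rotation and the one implied by the absolute rotations, under the assumed convention R_ij ≈ R_j R_i^T. The abstract does not specify RAGO's actual cost, so this is a stand-in, and all names are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(theta):
    """Rotation about the x axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def geodesic_angle(Ra, Rb):
    """Angle (radians) of the rotation that takes Ra to Rb."""
    cos = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def mra_cost(abs_rots, rel_rots):
    """Sum of angular residuals over edges.

    abs_rots: list of 3x3 absolute rotations R_i.
    rel_rots: dict {(i, j): R_ij}, assumed convention R_ij ~ R_j @ R_i.T.
    """
    return sum(
        geodesic_angle(R_ij, abs_rots[j] @ abs_rots[i].T)
        for (i, j), R_ij in rel_rots.items()
    )
```

Replacing every R_i by R_i G for a fixed rotation G leaves R_j R_i^T unchanged, so the cost is identical for all gauge-equivalent solutions; a regression loss on the R_i themselves would not have this property.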
Homography estimation is error-prone in the large-baseline case because of low image overlap and a limited receptive field. To address this, we propose a progressive estimation strategy that converts a large-baseline homography into multiple intermediate ones; cumulatively multiplying these intermediate homographies reconstructs the initial homography. Meanwhile, we introduce a semi-supervised homography identity loss that consists of two components: a supervised objective and an unsupervised objective. The supervised loss optimizes the intermediate homographies, while the unsupervised one helps to estimate a large-baseline homography without photometric losses. To validate our method, we propose a large-scale dataset that covers regular and challenging scenes. Experiments show that our method achieves state-of-the-art performance in large-baseline scenes while keeping competitive performance in small-baseline scenes. Code and dataset are available at https://github.com/megvii-research/LBHomo.
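
The cumulative multiplication of intermediate homographies can be sketched in a few lines of NumPy. This only illustrates the composition step, not the network that predicts the intermediate homographies; the function names are hypothetical, and the normalization assumes the bottom-right entry of the product is nonzero.

```python
import numpy as np

def compose_homographies(steps):
    """Chain 3x3 homographies given in the order they are applied."""
    total = np.eye(3)
    for H in steps:
        total = H @ total              # each later step multiplies on the left
    return total / total[2, 2]         # fix the projective scale (assumes nonzero)

def warp_point(H, xy):
    """Apply homography H to a 2D point using homogeneous coordinates."""
    x, y, w = H @ np.array([xy[0], xy[1], 1.0])
    return np.array([x / w, y / w])
```

For example, composing a translation by (2, 0) with a subsequent uniform scaling by 2 maps the point (1, 1) first to (3, 1) and then to (6, 2), the same result as applying the two warps one after the other.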
We present a novel camera path optimization framework for the task of online video stabilization. Typically, a stabilization pipeline consists of three steps: motion estimation, path smoothing, and novel view rendering. Most previous methods concentrate on motion estimation, proposing various global or local motion models. In contrast, path optimization has received relatively little attention, especially in the important online setting, where no future frames are available. In this work, we adopt recent off-the-shelf high-quality deep motion models for motion estimation to recover the camera trajectory, and focus on the latter two steps. Our network takes a short 2D camera path in a sliding window as input and outputs the stabilizing warp field of the last frame in the window, which warps the incoming frame to its stabilized position. A hybrid loss is designed to constrain spatial and temporal consistency. In addition, we build a motion dataset that contains stable and unstable motion pairs for training. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art online methods both qualitatively and quantitatively, and achieves performance comparable to offline methods.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
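Deep learning-based single image super-resolution (SISR) approaches have drawn much attention and achieved remarkable success on modern advanced GPUs. However, most state-of-the-art methods require a huge number of parameters, memory, and computational resources, and therefore usually show poor inference times when run on current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform reliable super-resolution in real time. The proposed nearest convolution has the same performance as nearest upsampling but is much faster and more suitable for Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3X dataset, and the comparison with other efficient SR methods shows that NCNet can achieve high-fidelity SR results with less inference time. Our code and pretrained models are publicly available at https://github.com/algolzw/ncnet.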
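
One way a convolution can reproduce nearest-neighbor upsampling exactly is sketched below in NumPy. It assumes a 1x1 convolution with fixed weights that copies every input channel r*r times, followed by a depth-to-space rearrangement. Whether NCNet implements its nearest convolution in exactly this way is an assumption, and the function names are hypothetical.

```python
import numpy as np

def nearest_upsample(x, r):
    """Reference nearest-neighbor upsampling of an (H, W, C) array by factor r."""
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def depth_to_space(y, r):
    """Rearrange (H, W, r*r*C) into (H*r, W*r, C); channel index is (i*r + j)*C + c."""
    h, w, ch = y.shape
    c = ch // (r * r)
    y = y.reshape(h, w, r, r, c)       # split channels into (i, j, c)
    y = y.transpose(0, 2, 1, 3, 4)     # reorder to (h, i, w, j, c)
    return y.reshape(h * r, w * r, c)

def nearest_conv(x, r):
    """1x1 convolution with frozen copy weights, then depth-to-space."""
    c = x.shape[-1]
    # Weight of shape (C, r*r*C): identity tiled r*r times, so output channel k
    # is a copy of input channel k % C.
    weight = np.tile(np.eye(c), (1, r * r))
    return depth_to_space(x @ weight, r)
```

Because the weights only copy values, the output matches plain nearest upsampling, while the computation is expressed with a convolution and a reshape, operations that mobile accelerators typically support well.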
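High dynamic range (HDR) deghosting algorithms aim to generate ghost-free HDR images with realistic details. Restricted by the locality of the receptive field, existing CNN-based methods are prone to producing ghosting artifacts and intensity distortions in the presence of large motion and severe saturation. In this paper, we propose a novel Context-Aware Vision Transformer (CA-ViT) for ghost-free high dynamic range imaging. The CA-ViT is designed as a dual-branch architecture that can jointly capture global and local dependencies. Specifically, the global branch employs a window-based Transformer encoder to model long-range object movements and intensity variations in order to resolve ghosting. For the local branch, we design a local context extractor (LCE) to capture short-range image features, and use a channel attention mechanism to select informative local details across the extracted features to complement the global branch. By incorporating the CA-ViT as the basic component, we further build the HDR-Transformer, a hierarchical network that reconstructs high-quality ghost-free HDR images. Extensive experiments on three benchmark datasets show that our approach outperforms state-of-the-art methods qualitatively and quantitatively, with a considerably reduced computational budget. Code is available at https://github.com/megvii-research/hdr-transformer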
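Obtaining ground-truth labels from a video is challenging because manually annotating pixel-wise flow labels is prohibitively expensive and laborious. Moreover, existing approaches try to adapt models trained on synthetic datasets to real videos, which inevitably suffers from domain discrepancy and hinders performance in real-world applications. To solve these problems, we propose RealFlow, an Expectation-Maximization based framework that can create large-scale optical flow datasets directly from any unlabeled realistic videos. Specifically, we first estimate the optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow. The new image pairs and their corresponding flows can thus be regarded as a new training set. In addition, we design a Realistic Image Pair Rendering (RIPR) module that adopts softmax splatting and bi-directional hole filling techniques to alleviate the artifacts of image synthesis. In the E-step, RIPR renders new images to create a large quantity of training data. In the M-step, we use the generated training data to train an optical flow network, which can then be used to estimate optical flows in the next E-step. Over the iterative learning steps, the capability of the flow network gradually improves, and so do the accuracy of the flow and the quality of the synthesized dataset. Experimental results show that RealFlow outperforms previous dataset generation methods. Moreover, based on the generated dataset, our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods. Our code and dataset are available at https://github.com/megvii-research/realflow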
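This paper studies a new, practical but challenging problem, called Class-Incremental Unsupervised Domain Adaptation (CI-UDA), where the labeled source domain contains all classes, but the classes in the unlabeled target domain increase sequentially. This problem is challenging because of two difficulties. First, the source and target label sets are inconsistent at each time step, which makes accurate domain alignment difficult. Second, previous target classes are unavailable in the current step, which leads to forgetting of previous knowledge. To address this problem, we propose a novel Prototype-guided Continual Adaptation (ProCA) method, consisting of two solution strategies. 1) Label prototype identification: we identify target label prototypes by detecting shared classes using the cumulative prediction probabilities of target samples. 2) Prototype-based alignment and replay: based on the identified label prototypes, we align both domains and force the model to retain previous knowledge. With these two strategies, ProCA is able to adapt the source model to a class-incremental unlabeled target domain effectively. Extensive experiments demonstrate the effectiveness and superiority of ProCA in resolving CI-UDA. The source code is available at https://github.com/hongbin98/proca.git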
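Not everyone has professional photography skills and sufficient shooting time, so captured images are occasionally tilted. In this paper, we propose a new and practical task, named Rotation Correction, which automatically corrects the tilt with high content fidelity under the condition that the rotation angle is unknown. This task can be easily integrated into image editing applications, allowing users to correct rotated images without any manual operation. To this end, we leverage a neural network to predict optical flows that can warp the tilted images to be perceptually horizontal. However, pixel-wise optical flow estimation from a single image is severely unstable, especially for large-angle tilted images. To enhance its robustness, we propose a simple but effective prediction strategy to form a robust elastic warp. In particular, we first regress a mesh deformation that can be transformed into robust initial optical flows. Then we estimate residual optical flows to give our network the flexibility of pixel-wise deformation, further correcting the details of the tilted images. To establish an evaluation benchmark and train the learning framework, we present a comprehensive rotation correction dataset with a large diversity of scenes and rotation angles. Extensive experiments demonstrate that, even without the angle prior, our algorithm can outperform other state-of-the-art solutions that require this prior. The code and dataset will be available at https://github.com/nie-lang/rotationCorrection.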